Mapping Snakemake

Check Validity of Sample_ids

Check whether any sample ID contains a '.' character.

Check Validity of RNA_types

Check whether any RNA type contains a '.' character.
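Both validity checks reduce to the same test on a list of names. A minimal Python sketch, assuming a name counts as invalid when it contains '.'; the function name `find_invalid_names` and the example IDs are hypothetical and not taken from the pipeline:

```python
def find_invalid_names(names, forbidden="."):
    """Return the names that contain the forbidden character."""
    return [name for name in names if forbidden in name]


if __name__ == "__main__":
    # Hypothetical sample IDs, for illustration only.
    sample_ids = ["sample_1", "sample.2", "S3"]
    invalid = find_invalid_names(sample_ids)
    if invalid:
        print("invalid sample IDs:", ", ".join(invalid))
```

The same function can be applied to the list of RNA types.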

Parameter Settings

data_dir path to .fastq files
rna_types RNA types whose indexes the reads are mapped to
adaptor adaptor sequence that cutadapt trims from the reads
min_read_length discard reads shorter than this length
genome_dir
max_read_length discard reads longer than this length
min_base_quality minimum base quality used for quality trimming
temp_dir directory for temporary files

Mapping Statistics

read_counts_raw

Count reads in .fastq files of raw data

wc -l < {input} | awk '{{print int($0/4)}}' > {output}
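The command above divides the line count by four because each FASTQ record spans four lines. The same arithmetic as a small Python sketch; `count_fastq_reads` is a hypothetical helper, and it assumes sequences are not wrapped across lines:

```python
def count_fastq_reads(lines):
    """Count FASTQ reads from an iterable of lines (4 lines per read)."""
    n_lines = sum(1 for _ in lines)
    # Integer division mirrors int($0/4) in the awk step.
    return n_lines // 4


if __name__ == "__main__":
    # Two illustrative records, given as a list of lines.
    example = ["@r1\n", "ACGT\n", "+\n", "IIII\n",
               "@r2\n", "GGCA\n", "+\n", "IIII\n"]
    print(count_fastq_reads(example))
```

An open file handle can be passed in place of the list, since iterating over a file yields its lines.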

read_counts_mapped

Count reads in .bam files of mapped reads

bamtools count -in {input} > {output}

read_counts_unmapped

Count reads in .fa.gz files of unmapped reads

pigz -p {threads} -d -c {input} | wc -l | awk '{{print int($0/2)}}' > {output}
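The shell command decompresses the FASTA file and halves the line count, which assumes one header line and one sequence line per read. A Python sketch of the same count using the standard `gzip` module; it counts header lines instead, which gives the same number for two-line records and also tolerates wrapped sequences. The helper name `count_gz_fasta_reads` is hypothetical:

```python
import gzip
import io


def count_gz_fasta_reads(fileobj):
    """Count FASTA records in a gzip-compressed stream by counting '>' headers."""
    with gzip.open(fileobj, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))


if __name__ == "__main__":
    # Build a tiny gzip-compressed FASTA in memory, for illustration only.
    data = gzip.compress(b">r1\nACGT\n>r2\nGGCA\n")
    print(count_gz_fasta_reads(io.BytesIO(data)))
```

`gzip.open` also accepts a file path, so the function works unchanged on a `.fa.gz` file name.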

summarize_read_counts

mapped_read_length

Run a Python script to compute the read-length histogram of each .bam file produced by sequential mapping.

bin/statistics.py read_length_hist --max-length 600 -i {input} -o {output}
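The internals of bin/statistics.py are not shown here, so the following is only a sketch of what a read-length histogram with a length cap could look like. It assumes read lengths have already been extracted from the alignments and that reads longer than the cap are ignored; both points are assumptions, not the script's documented behavior:

```python
from collections import Counter


def read_length_hist(lengths, max_length=600):
    """Return a list where index n holds the number of reads of length n.

    Reads longer than max_length are dropped (an assumption of this sketch).
    """
    counts = Counter(n for n in lengths if 0 <= n <= max_length)
    return [counts.get(n, 0) for n in range(max_length + 1)]


if __name__ == "__main__":
    # Illustrative lengths only; 700 exceeds the default cap and is ignored.
    hist = read_length_hist([21, 22, 22, 700])
    print(hist[21], hist[22])
```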

merge_mapped_read_length

fastqc

Run FastQC on the raw data.

parse_fastqc_data

summarize_fastqc_ipynb

summarize_fastqc_html

cutadapt

cutadapt removes adapter sequences from high-throughput sequencing reads.

cutadapt -a {params.adaptor} -m {params.min_read_length} --trim-n -q {min_base_quality} --too-short-output >(pigz -c -p {threads} > {output.too_short}) -o {output.trimmed} {input}
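To illustrate what the -a, -m and --too-short-output options do together, here is a deliberately simplified Python sketch. It is not cutadapt's algorithm: it only cuts at the first exact occurrence of the adapter, whereas cutadapt tolerates mismatches and partial adapters at the read end, and the quality trimming (-q) and N trimming (--trim-n) steps are not modeled. The function names are hypothetical:

```python
def trim_adapter(seq, qual, adapter):
    """Cut the read at the first exact occurrence of the adapter, if any."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


def split_by_length(reads, adapter, min_read_length):
    """Trim each (name, seq, qual) read and route it by trimmed length."""
    trimmed, too_short = [], []
    for name, seq, qual in reads:
        seq, qual = trim_adapter(seq, qual, adapter)
        # Reads below the minimum go to the separate too-short output.
        target = trimmed if len(seq) >= min_read_length else too_short
        target.append((name, seq, qual))
    return trimmed, too_short


if __name__ == "__main__":
    # Illustrative reads and adapter, not real data.
    reads = [("r1", "ACGTACGTTTAGG", "I" * 13), ("r2", "ACTTAGG", "I" * 7)]
    kept, short = split_by_length(reads, adapter="TTAGG", min_read_length=5)
    print(len(kept), len(short))
```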

fastq_to_fasta

Convert FASTQ to FASTA, discarding the quality information.
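The rule's actual command is not shown, so this is a sketch of the conversion itself rather than the pipeline's implementation. It assumes four-line FASTQ records and keeps only the header and sequence lines; `fastq_to_fasta` is a hypothetical helper name:

```python
def fastq_to_fasta(lines):
    """Yield FASTA lines from FASTQ lines (four lines per record assumed)."""
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 == 0:
            # Header: replace the leading '@' with '>'.
            yield ">" + line[1:]
        elif i % 4 == 1:
            # Sequence line is kept as-is; '+' and quality lines are dropped.
            yield line


if __name__ == "__main__":
    # One illustrative record.
    record = ["@r1\n", "ACGT\n", "+\n", "IIII\n"]
    print("\n".join(fastq_to_fasta(record)))
```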

tbam_to_gbam

Convert a BAM file of alignments in transcript coordinates into a BAM file of alignments in genomic coordinates.

rsem-tbam2gbam {params.index} {input.bam} {output}

sort_gbam

samtools sort {input} > {output.bam}
samtools index {output.bam}

gbam_to_bedgraph

gbedgraph_to_bigwig

sort_tbam

samtools sort -T {params.temp_dir} -o {output} {input}

collect_alignment_summary_metrics

Produces a summary of alignment metrics from a SAM or BAM file.

picard CollectAlignmentSummaryMetrics I={input} O={output}

count_reads_intron

Given a .bed file of intron loci and the other.bam file of reads mapped to hg38, count the reads that overlap introns.

bedtools intersect -bed -wa -s -a {input.bam} -b {input.bed} | wc -l > {output}
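The count produced by this command can be understood as follows: -s keeps only overlaps on the same strand, and -wa reports the read once per overlapping feature, so a read that overlaps two introns contributes two lines. A brute-force Python sketch of that counting rule, assuming 0-based half-open intervals as in BED; the function name and the example intervals are hypothetical, and the quadratic loop is for illustration rather than real data sizes:

```python
def count_stranded_overlaps(reads, features):
    """Count (read, feature) pairs that overlap on the same chromosome and strand.

    Each interval is (chrom, start, end, strand), 0-based and half-open.
    """
    n = 0
    for r_chrom, r_start, r_end, r_strand in reads:
        for f_chrom, f_start, f_end, f_strand in features:
            same_place = r_chrom == f_chrom and r_strand == f_strand
            if same_place and r_start < f_end and f_start < r_end:
                n += 1
    return n


if __name__ == "__main__":
    # Illustrative intervals only.
    reads = [("chr1", 100, 150, "+"), ("chr1", 100, 150, "-")]
    introns = [("chr1", 120, 200, "+")]
    print(count_stranded_overlaps(reads, introns))
```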

count_reads_promoter

Given a .bed file of promoter loci and the other.bam file of reads mapped to hg38, count the reads that overlap promoters.

count_reads_enhancer

Given a .bed file of enhancer loci and the other.bam file of reads mapped to hg38, count the reads that overlap enhancers.

map_circRNA

The software aligns the unmapped reads to the circRNA index ...

pigz -d -c other.fa.gz | bowtie2 -f -p {threads} --norc --sensitive --no-unal --un-gz circRNA.aligner.fa.gz -x circRNA - -S - | bin/preprocess.py filter_circrna_reads --filtered-file >(samtools view -b -o {output.bam_filtered}) | samtools view -b -o {output.bam}